38 results found.
Written
Evaluation Data,
Language Type:
Multilingual
Languages:
Arabic, Egyptian Arabic, English, South Levantine Arabic
Availability:
Freely Available
License:
Creative Commons Attribution-ShareAlike 4.0 International Public License
Size:
8988 sentences
Production Status:
Newly created-finished
Use:
Machine Translation, Speech-to-Speech Translation
- Paper title: The SADID Evaluation Datasets for Low-Resource Spoken Language Machine Translation of Arabic Dialects
- Paper track: Long paper
- Paper status: Accept Poster

| Author Number | Name | Resource Name | Country |
|---|---|---|---|
| Main Contact | Wael Abid | The SADID Evaluation Datasets for Arabic Dialects | <Not Specified> |
Documentation:
Yes. English. Publicly available
Written
Corpus,
Language Type:
Bilingual
Languages:
Arabic, Egyptian Arabic
Availability:
From Data Center(s)
License:
LDC
Size:
118568 KByte
Production Status:
Existing-used
Use:
Text Normalization
- Paper title: Phonetic and Visual Priors for Decipherment of Informal Romanization
- Paper track: Long/Phonology, Morphology and Word Segmentation
- Paper status: Accept

| Author Number | Name | Resource Name | Country |
|---|---|---|---|
| Main Contact | Maria Ryskina | BOLT Egyptian Arabic SMS/Chat and Transliteration | <Not Specified> |
Documentation:
None

Language Type:
Trilingual
Languages:
Egyptian Arabic, English, Mandarin Chinese
Availability:
The Data Will Be Published Via LDC General Catalogue
License:
<Not Specified>
Size:
2709094 words
Production Status:
Newly created-finished
Use:
Parsing and Tagging
- Paper title: Large Multi-lingual, Multi-level and Multi-genre Annotation Corpus
- Paper track: Written
- Paper status: Accept Oral

| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Author 1 | Xuansong Li | Linguistic Data Consortium, University of Pennsylvania | US |
| Author 2 | Martha Palmer | Department of Linguistics and Computer Science, University of Colorado | US |
| Author 3 | Nianwen Xue | Computer Science Department, Brandeis University | US |
| Author 4 | Lance Ramshaw | Raytheon BBN Technologies | US |
| Author 5 | Mohamed Maamouri | Linguistic Data Consortium, University of Pennsylvania | US |
| Author 6 | Ann Bies | Linguistic Data Consortium, University of Pennsylvania | US |
| Author 7 | Kathryn Conger | Department of Linguistics and Computer Science, University of Colorado | US |
| Author 8 | Stephen Grimes | Linguistic Data Consortium, University of Pennsylvania | US |
| Author 9 | Stephanie Strassel | Linguistic Data Consortium, University of Pennsylvania | US |
| Main Contact | Xuansong Li | Linguistic Data Consortium, University of Pennsylvania | US |
Documentation:
<Not Specified>
Speech
Corpus,
Language Type:
Monolingual
Languages:
Abidji, Achang, Achi, Achinese, Achuar-Shiwiar, Acoli, Adele, Adhola, Adioukrou, Aguacateco, Aguaruna, Agutaynen, Akan, Akawaio, Akeu, Akha, Alangan, Albanian, Alune, Alur, Ambai, Aralle-Tabulahan, Arop-Lokep, Arosi, Avaric, Avatime, Avokaya, Awa, Awadhi, Denya, Egyptian Arabic, Gikyode, Hamer-Banna, Ivbie North-Okpela-Arhe, K'iche', Mesopotamian Arabic, Obolo, Siwu, Southern Altai, Standard Arabic, Sudanese Arabic
Availability:
Freely Available
License:
CC BY-NC-SA
Size:
None
Production Status:
Newly created-finished
Use:
Phonetics
- Paper title: A Corpus for Large-Scale Phonetic Typology
- Paper track: Long/Resources and Evaluation
- Paper status: Accept

| Author Number | Name | Resource Name | Country |
|---|---|---|---|
| Main Contact | Elizabeth Salesky | VoxClamantis v1.0 | <Not Specified> |
Documentation:
None
Speech
Corpus,
Language Type:
Monolingual
Languages:
Arabic, Bengali, Central Khmer, Chinese, Dari, Egyptian Arabic, English, Georgian, Hindi, Iranian Persian, Italian, Japanese, Korean, Lao, Mandarin Chinese, Min Nan Chinese, Moroccan Arabic, Northern Khmer, Panjabi, Persian, Russian, Spanish, Tagalog, Thai, Tigrinya, Urdu, Uzbek, Vietnamese, Wu Chinese, Yue Chinese
Availability:
From Data Center(s)
License:
LDC
Size:
None
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
- Paper title: End-to-End Neural Speaker Diarization with Permutation-Free Objectives
- Paper track: 4.5 Speaker diarization/Poster Presentation
- Paper status: Accept - Poster

| Author Number | Name | Resource Name | Country |
|---|---|---|---|
| Main Contact | Yusuke Fujita | 2008 NIST Speaker Recognition Evaluation | <Not Specified> |
Documentation:
None
Speech
Corpus,
Language Type:
Monolingual
Languages:
Egyptian Arabic, English, French, German, Hindi, Iranian Persian, Japanese, Korean, Mandarin Chinese, Russian, Spanish, Tamil, Vietnamese
Availability:
From Owner
License:
LDC
Size:
46 hours
Production Status:
Existing-used
Use:
Language Identification
- Paper title: Metric learning loss functions to reduce domain mismatch in the x-vector space for language recognition
- Paper track: 4.1 Language identification and verification, lang/Oral Presentation
- Paper status: Accept - Poster

| Author Number | Name | Resource Name | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | 2003 NIST Language Recognition Evaluation | <Not Specified> |
Documentation:
None
Speech
Corpus,
Language Type:
Multilingual
Languages:
Arabic, Bengali, Dari, Egyptian Arabic, English, Georgian, Hindi, Iranian Persian, Italian, Japanese, Khmer, Korean, Lao, Mandarin Chinese, Min Nan Chinese, Moroccan Arabic, Panjabi, Persian, Russian, Spanish, Tagalog, Thai, Tigrinya, Urdu
Availability:
From Owner
License:
LDC
Size:
640 hours
Production Status:
Existing-used
Use:
Language Identification
- Paper title: Modeling and training strategies for language recognition systems
- Paper track: 4.1 Language identification and verification, lang/Oral Presentation
- Paper status: Accept

| Author Number | Name | Resource Name | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | 2008 NIST Speaker Recognition Evaluation | <Not Specified> |
Documentation:
None
Speech
Corpus,
Language Type:
Monolingual
Languages:
Arabic, Bengali, Dari, Egyptian Arabic, English, Georgian, Hindi, Iranian Persian, Italian, Japanese, Khmer, Korean, Lao, Mandarin Chinese, Min Nan Chinese, Moroccan Arabic, Panjabi, Persian, Russian, Spanish, Tagalog, Thai, Tigrinya, Urdu
Availability:
From Owner
License:
LDC
Size:
950 hours
Production Status:
Existing-updated
Use:
Language Identification
- Paper title: Modeling and training strategies for language recognition systems
- Paper track: 4.1 Language identification and verification, lang/Oral Presentation
- Paper status: Accept

| Author Number | Name | Resource Name | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | 2008 NIST Speaker Recognition Evaluation Training Set Part 2 | <Not Specified> |
Documentation:
None
Speech
Corpus,
Language Type:
Monolingual
Languages:
Egyptian Arabic
Availability:
Not Available
License:
<Not Specified>
Size:
1278 sentences tokens
Production Status:
Existing-used
Use:
prosodic analysis
- Paper title: Acoustic cues to topic and narrow focus in Egyptian Arabic
- Paper track: 2.5 Articulatory and acoustic features of prosody/Poster Presentation
- Paper status: Accept - Poster

| Author Number | Name | Resource Name | Country |
|---|---|---|---|
| Main Contact | Dina ElZarka | corpus, speech | <Not Specified> |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
Egyptian Arabic
Availability:
Will be available with the publication of the paper
License:
<Not Specified>
Size:
1700 sentences
Production Status:
Newly created-finished
Use:
Opinion Mining/Sentiment Analysis
- Paper title: Arabizi Language Models for Sentiment Analysis
- Paper track: Long paper
- Paper status: Accept Oral

| Author Number | Name | Resource Name | Country |
|---|---|---|---|
| Main Contact | Gaétan Baert | SALAD | <Not Specified> |
Documentation:
None